Skip to content

xtensa: support for C library provided by Xtensa toolchain #90351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

williamtcdns
Copy link
Collaborator

This change add an additional C Library Implementation provided by Xtensa toolchain.

Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes and nitpicks. Do check the interrupt return path though, pretty sure that one's a bug.

This is something we've needed for a while, SOF targets generally build with the toolchain libc already but do it ad hoc and without OS support.

I do worry that with the need for all that synchronization (and maybe TLS?) support, that there's a lack of testing for this on an SMP platform. Is there a way to validate this on an existing qemu device somehow, maybe with a stub? Otherwise the Intel DSPs are going to end up being guinea pigs for upstream.

#if defined(CONFIG_XTENSA_LIBC)
rsr a5, PRID
bnez a5, 1f
CALL _init_reent_bss
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that some (actually most) xtensa SOCs don't come through this code, having their own bootstrap path (often coming out of e.g. bootloader ROMs that have already done the hardware setup). If this has complicated things to do we should consider adding a test somewhere to catch platforms that mess it up.

Alternatively, if this is something that isn't required for CPU0 (looks like it isn't) consider moving to arch_kernel_init(), which runs on a cleanly-initialized CPU capable of running C code. (Though I guess this isn't a windowed call so would still need to be done in inline assembly)

Finally, my memory is that PRID is not guaranteed to be equal to zero for the boot CPU. On esp32, my memory (which is fuzzy, Espressif folks will want to comment) is that both cores had non-zero/not-even-small-integer PRID values. This may not work the way you want, which would require putting this code on the SMP bringup path.

Also: um... what exactly does "init bss" even mean? Is .bss not expected to be zero-filled?!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, correcting typo/ambiguity: "isn't required for CPU1+". That is, this looks like code you want to run on the boot CPU only, after startup but before any subsystem code that might touch the libc runs. That's exactly what arch_kernel_init() is for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I successfully verified moving the _init_reent_bss() call to arch_kernel_init(), but in an SMP SoC is arch_kernel_init() meant to run only on the boot CPU ? if not, is there an existing CONFIG_ macro which has the ID of the boot CPU ?

_init_reent_bss() is a bit of misnomer; _init_reent_bss() is to be called to initialize the single global copy of struct _reent state for single-threaded executables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest update of this PR, calling _init_reent_bss() is no longer needed.
__DYNAMIC_REENT__ support in xclib is used whereby the _reent struct is retrieved using __getreent().

@andyross Is arch_kernel_init() guaranteed to run only on the boot CPU ?
if not, is there an existing CONFIG_ macro which has the ID of the boot CPU ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR no longer modifies the file arch/xtensa/include/kernel_arch_func.h which holds arch_kernel_init().

But I would appreciate a response on whether arch_kernel_init() is guaranteed to run only on the boot CPU.

#if defined(CONFIG_XTENSA_LIBC) && defined(CONFIG_MULTITHREADING)
rsr a6, ZSR_CPU
l32i a6, a6, ___cpu_t_current_OFFSET
call4 _xclib_update_reent_ptr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems awkward to do this in assembly in the darkest center of the context switch. It's just a C function call, why not add e.g. _xclib_update_reent_ptr(_current); to arch_switch(), which is the inline wrapper around this. It's currently degenerate but at some point in the past had C code in it.

Also unless I'm missing it, this needs to be added to the context switch path on interrupt return too, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually reading that, I think the argument wants to be &_current->reent and not the bare thread pointer, right? The libc obviously doesn't know the layout of a zephyr thread, it wants that struct _reent I assume?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks just like newlib; can it use the same bits as that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross correct I need the reent structure address of the current thread to update the global struct _reent *_reent_ptr used by xclib for threadsafety.

@keith-packard please could you point me to the sources for newlib ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newlib integration bits for Zephyr are in lib/libc/newlib; you can either copy all of that and adjust to suit or try to share the code between newlib and the Xtensa toolchain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-packard The integration bits do not seem to implement anything for managing the current struct _reent.
I would have expected to see __DYNAMIC_REENT or _REENT_THREAD_LOCAL set.

Copy link
Collaborator Author

@williamtcdns williamtcdns Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest update of this PR, __DYNAMIC_REENT__ support in xclib is used whereby the _reent struct is retrieved using __getreent().
It is no longer needed to do a _reent pointer update during context switch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-packard @andyross To further test SMP thread safety, I am using tinyraytracer which does ray-tracing and render on the terminal:
https://github.com/BrunoLevy/TinyPrograms/blob/main/tinyraytracer.c

Each pixel is computed independently, so I saw an opportunity to use it as a multi-threading test.

I ported it to Zephyr, where as many thread as there are cores available get created:
williamtcdns@35d41e30

Both Xtensa C library and PICOLIBC successfully complete the rendering for a single core:
image

However, with more than one cores (ie: 4 cores in my tests), PICOLIBC fails, while I somewhat (noise in the rendering being debugged) get a complete rendering with Xtensa C library because it manages the current struct _reent:
image

If the above Zephyr port of tinyraytracer sounds like an interesting SMP thread safety test, I would be more than happy to submit it as a sample test.

set_property(TARGET linker PROPERTY cpp_base -lstdc++11)
else()
set_property(TARGET linker PROPERTY cpp_base -lstdc++)
endif()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to this PR, but note that once this lands SOF will want to use it instead of its current integration for a handful of third party C++ modules, which add the libraries manually. (@lyakh @kv2019i et. al.)

_Initlocks();
return 0;
}
SYS_INIT(xclib_Initlocks, APPLICATION, CONFIG_KERNEL_INIT_PRIORITY_DEFAULT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use a SYS_INIT() hook for arch code, it wastes two words for the record and however many instructions it takes for that wrapper that adds a needless return value. Arch code has copious spots to put initialization code in before calling z_cstart(), c.f. the arch_kernel_init() suggestion above.

Also, note that the "APPLICATION" init level is very late, and there may be driver/subsystem code from many sources (including out-of-the-zephyr-tree) that relies on libc behavior and whatever _Initlocks() needed to do. If for some reason you do need this to be a SYS_INIT() hook, I'm guessing you want it to be PRE_KERNEL.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I also successfully verified moving the _Initlocks() call to arch_kernel_init().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest update of this PR, calling _Initlocks() is no longer needed.
In fact, this PR no longer modifies the file arch/xtensa/include/kernel_arch_func.h which holds arch_kernel_init().

}
SYS_INIT(xclib_Initlocks, APPLICATION, CONFIG_KERNEL_INIT_PRIORITY_DEFAULT);

typedef struct k_mutex *_Rmtx;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pet peeve: you probably want a spinlock or a k_sem here and not a mutex. k_mutex is very heavyweight and the only feature it brings is priority inheritance. Semaphores are a better general purpose blocking IPC lock for almost all purposes. But given the use case, I'm thinking that the libc is probably using this to protect tight critical sections, where spinlocks are much lighter still and objectively better choices.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially used k_sem, but a recursive locking mechanism is needed to avoid hangs.

@williamtcdns
Copy link
Collaborator Author

Notes and nitpicks. Do check the interrupt return path though, pretty sure that one's a bug.

This is something we've needed for a while, SOF targets generally build with the toolchain libc already but do it ad hoc and without OS support.

I do worry that with the need for all that synchronization (and maybe TLS?) support, that there's a lack of testing for this on an SMP platform. Is there a way to validate this on an existing qemu device somehow, maybe with a stub? Otherwise the Intel DSPs are going to end up being guinea pigs for upstream.

Thank you for all the feedbacks.

Regarding the interrupt return path, I am assuming that xtensa_switch() is single point where context switch happens, regardless of whether in the interrupt return path.

Xtensa Tools RJ-2025.5 has been released, which includes a 2 cores config called hifi5s_2c_l2, and a 4 cores config called hifi5s_4c_l2.
Their initial Zephyr ports are available at:
https://github.com/foss-xtensa/zephyr/commits/hifi5s_2c_l2
https://github.com/foss-xtensa/zephyr/commits/hifi5s_4c_l2

Until I complete the support for using the same xt-sim board with a selected Xtensa config in the environment variable XTENSA_CORE, separate patch sets are used.

I am using those 2 SMP configs to validate this Xtensa toolchain C library support.

@williamtcdns williamtcdns force-pushed the xclib_support branch 2 times, most recently from ebdeba9 to b1d3e55 Compare May 27, 2025 18:09
@dcpleung
Copy link
Member

Also, shouldn't the _reent_ptr be per-CPU?

@williamtcdns
Copy link
Collaborator Author

Also, shouldn't the _reent_ptr be per-CPU?

Yes, _reent_ptr is per-CPU, and declared as following:
struct _reent * _reent_ptr attribute ((section (".clib.percpu.bss")));

Where the linker puts it in .clib.percpu.bss section, which ends up in per-CPU local memory.

It makes per-CPU local memory a requirement.

@williamtcdns williamtcdns force-pushed the xclib_support branch 3 times, most recently from b36ab5c to 6945f41 Compare June 3, 2025 17:09
@williamtcdns
Copy link
Collaborator Author

williamtcdns commented Jun 3, 2025

Also, shouldn't the _reent_ptr be per-CPU?

Yes, _reent_ptr is per-CPU, and declared as following: struct _reent * _reent_ptr attribute ((section (".clib.percpu.bss")));

Where the linker puts it in .clib.percpu.bss section, which ends up in per-CPU local memory.

It makes per-CPU local memory a requirement.

With the latest update of this PR, __DYNAMIC_REENT__ support in xclib is used whereby the _reent struct is retrieved using __getreent().
It removes the per-CPU local memory requirement.

@williamtcdns williamtcdns force-pushed the xclib_support branch 2 times, most recently from a8f3156 to 3a8ca08 Compare June 10, 2025 20:01
This change add an additional C library implementation
provided by Xtensa toolchain.

Signed-off-by: William Tambe <williamt@cadence.com>
Changes needed to build C++ code using Xtensa toolchain.

Signed-off-by: William Tambe <williamt@cadence.com>
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants